An Effective Combination of Different Order N-grams

Authors

Sen Zhang

Na Dong

Abstract

This paper proposes an approach for combining n-grams of different orders based on a discriminative estimation criterion, under which the n-gram parameters can be optimized. To increase the power of the language model, we propose several schemes for combining conventional n-gram language models of different orders. We use the Newton gradient method to estimate the assumption probabilities and then test the optimally selected language model. Experiments are conducted on the task of converting Chinese pinyin to Chinese characters. The results show that the memory footprint of the language model can be reduced remarkably with little loss of accuracy.
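To make the idea of combining n-gram orders concrete, here is a minimal sketch of plain linear interpolation between a unigram and a bigram model. It is not the discriminative scheme from the paper: the weight `lam` is a hand-set, hypothetical stand-in for the assumption probabilities that the paper estimates with the Newton gradient method, and the function names and toy corpus are illustrative only.

```python
from collections import Counter


def train_counts(tokens):
    """Collect unigram and bigram counts from a token sequence."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def interpolated_prob(word, prev, unigrams, bigrams, lam):
    """Estimate P(word | prev) as a weighted mix of two model orders.

    lam is the weight on the bigram estimate (an assumed, hand-set value);
    the remaining (1 - lam) goes to the unigram estimate.
    """
    total = sum(unigrams.values())
    p_uni = unigrams[word] / total if total else 0.0
    p_bi = bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0
    return lam * p_bi + (1.0 - lam) * p_uni


# Toy corpus, purely illustrative.
tokens = "a b a b a c".split()
unigrams, bigrams = train_counts(tokens)
p = interpolated_prob("b", "a", unigrams, bigrams, lam=0.5)
```

In a setup like the one the abstract describes, the fixed `lam` would presumably be replaced by weights optimized against a discriminative criterion; the sketch only shows where such weights enter the probability estimate.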


Similar resources

Comparison of Character n-grams and Lexical Features on Author, Gender, and Language Variety Identification on the Same Spanish News Corpus

We compare the performance of character n-gram features (n = 3–8) and lexical features (unigrams and bigrams of words), as well as their combinations, on the tasks of authorship attribution, author profiling, and discriminating between similar languages. We developed a single multi-labeled corpus for the three aforementioned tasks, composed of news articles in different varieties of Spanish. We...


MIRACLE's Hybrid Approach to Bilingual and Monolingual Information Retrieval

The main goal of the bilingual and monolingual participation of the MIRACLE team in CLEF 2004 was to test the effect of combination approaches on information retrieval. The starting point was a set of basic components: stemming, transformation, filtering, generation of n-grams, weighting and relevance feedback. Some of these basic components were used in different combinations and order of appl...


Ensemble classifier for Twitter sentiment analysis

In this paper, we present a combination of different types of sentiment analysis approaches in order to improve on their individual performance. These consist of (I) ranking algorithms for scoring sentiment features as bi-grams and skip-grams extracted from annotated corpora; (II) a polarity classifier based on a deep learning algorithm; and (III) a semi-supervised system founded on the...


From Characters to Words to in Between: Do We Capture Morphology?

Words can be represented by composing the representations of subword units such as word segments, characters, and/or character n-grams. While such representations are effective and may capture the morphological regularities of words, they have not been systematically compared, and it is not understood how they interact with different morphological typologies. On a language modeling task, we pre...


Combination of different n-grams based on their different assumptions

This paper addresses the negative impact that assumptions artificially introduced by different n-grams have on their performance in natural language processing. To increase the power of the language model, we propose several schemes for combining conventional n-gram language models of different orders by introducing assumption probabilities. The assumption probabilities are estimated on the b...




Publication date: 2003
